OK, let me propose an even more "interesting" scenario: you are in a room with arbitrary properties. I do not specify what any of those properties are. Now you can't answer any question about the room definitively! Isn't that much more interesting?
That is in fact one of Borges's themes:
Borges translation linked in my first post wrote:
(I know of an uncouth region whose librarians repudiate the vain and superstitious custom of finding a meaning in books and equate it with that of finding a meaning in dreams or in the chaotic lines of one's palm ... They admit that the inventors of this writing imitated the twenty-five natural symbols, but maintain that this application is accidental and that the books signify nothing in themselves. This dictum, we shall see, is not entirely fallacious.)
"But transportation issues are social-justice issues. The toll of bad transit policies and worse infrastructure—trains and buses that don’t run well and badly serve low-income neighborhoods, vehicular traffic that pollutes the environment and endangers the lives of cyclists and pedestrians—is borne disproportionately by black and brown communities."
CatharzGodfoot wrote:As long as there is enough repetition within the message, a Huffman tree encoding will generate a smaller output regardless of the number of possibly meaningful substrings.
Um...yes, the algorithm compresses repetitive messages, so as long as the message is repetitive, it will get shorter. If you intended some more insightful point than that, I'm afraid it has eluded me.
You seemed to be indicating that the set of all possible meaningful inputs had some influence on the effectiveness of an encryption algorithm. In the case of a Huffman code, that isn't true. If I misunderstood, I apologize.
Manxome wrote:
CatharzGodfoot wrote:It's true that encoding the alphabet with a Huffman tree lands you a longer message than you started with, but for messages significantly longer than the size of the alphabet you'll never end up with anything more than a constant amount larger than the original.
Is there a proof for that? While it is possible that the asymptotic behavior never adds more than a constant amount to the length of the message, it's not at all obvious to me that this is the case for this algorithm.
For example, a string of the form "ABACADAEAF....AYAZBCBDBE..." is substantially longer than the alphabet, but still compresses extremely poorly (no repeated bigrams). I'm not 100% certain that you could find a similar abusive pattern for arbitrarily long strings, but I don't see any obvious reason that you couldn't.
And even if this claim is correct, it doesn't change the fact that the average compression ratio across all possible messages under a fixed length can't be better than 1. It will just mean that for every string that attains a linear (rather than constant) compression, there will be a linear number of strings that get a constant amount worse, to maintain the average.
Not ignoring you, but I'll need to think about that a bit...
CatharzGodfoot wrote:You seemed to be indicating that the set of all possible meaningful inputs had some influence on the effectiveness of an encryption algorithm. In the case of a Huffman code, that isn't true. If I misunderstood, I apologize.
It influences it to the extent that the entropy http://en.wikipedia.org/wiki/Entropy_(i ... on_theory) of the source material puts a maximum limit on its compressibility (no matter what algorithm you use), and if all messages are equally likely, that limit is "none."
(Sorry about the funky link, this forum software throws a fit if you try to create a url tag with parentheses in the address.)
By way of sanity check, I'll say that the above displays in my browser in code form--there is no clickable hyperlink, and the innards of the url tag are visible. If it works for you, then there's something really weird going on.
The tinyurl version in Crissa's edit works, of course (no weird chars in the url).
The really interesting error was when I just enclosed the address in url tags instead of using the url= syntax. That resulted in my entire post appearing totally blank (at least in the preview).